Abstract: QoS-based Web service recommendation has recently
gained much attention for providing a promising way to
help users find high-quality services. To facilitate such
recommendations, existing studies suggest the use of collaborative
filtering techniques for personalized QoS prediction. These
approaches, by leveraging partially observed QoS values from users,
can achieve high accuracy of QoS predictions on the unobserved ones.
However, the requirement to collect users' QoS data likely puts
user privacy at risk, thus making them unwilling to contribute
their usage data to a Web service recommender system. As a
result, privacy becomes a critical challenge in developing practical
Web service recommender systems. In this paper, we make the
first attempt to cope with the privacy concerns for Web service
recommendation. Specifically, we propose a simple yet effective
privacy-preserving framework by applying data obfuscation
techniques, and further develop two representative privacy-preserving
QoS prediction approaches under this framework. Evaluation
results from a publicly-available QoS dataset of real-world Web
services demonstrate the feasibility and effectiveness of our
privacy-preserving QoS prediction approaches. We believe our
work can serve as a good starting point to inspire more research
efforts on privacy-preserving Web service recommendation.

Keywords: Web service recommendation; QoS prediction;
collaborative filtering; privacy preservation

I. Introduction 

Web services are self-contained units of software functionalities
(e.g., retrieving currency exchange rates) delivered over
the Internet for users to build composite Web applications.
Recent advances in cloud computing enable on-demand service
delivery and promote the rapid growth of service markets,
where more and more Web services are expected to become
available. Whereas the abundance of Web services meets
the various needs of different users (i.e., Web application
providers), it also poses a significant challenge in selecting
among a large number of similar services [1]. In this context,
Web service recommendation [2], [3], [4], which aims to help
users quickly find desirable services, has become a hot research
issue in the area of service computing in recent years.

Effective service recommendation needs to fulfil both
functional and non-functional requirements of users. While
functional requirements focus on what a service does,
non-functional requirements are concerned with the quality of
service (QoS), such as response time, throughput, and failure
probability. QoS plays an important role in Web service
recommendation, according to which similar services can be
ranked and selected for users. Service invocations usually rely
on the Internet for connectivity and are heavily influenced by
the dynamic network conditions. Therefore, users at different
locations typically observe different QoS values even on
the same Web service. To enable personalized Web service
recommendation, QoS evaluation from the user side is desired.
However, it is a challenge to acquire user-perceived QoS
values of all the services because each user has observed
QoS values on only a few used services. It is also impractical for
each user to actively measure these QoS values due to the
expensive overhead of invoking a large number of services.

To address this issue, collaborative QoS prediction has
recently been proposed and has become a key step in QoS-based
Web service recommendation. By applying collaborative
filtering (CF) techniques [5] that are widely used in commercial
recommender systems, unknown QoS can be predicted based
on historical usage data collected from users, eliminating the
need for additional service invocations. In other words, users
can contribute their historical QoS data on the services they
have used and receive prediction results on the QoS values of
the services that they have never used before. In recent
literature, a number of collaborative filtering approaches have been
proposed for QoS prediction. Among them, neighbourhood-based
CF approaches (e.g., UIPCC |1^) leverage the similarity
between users and/or the similarity between services calculated
on the observed QoS data for unknown QoS prediction.
Model-based approaches (e.g., PMF O, EMF OTJ) fit the observed
QoS data with a pre-defined model (e.g., low-rank matrix
factorization), and then utilize the trained model for QoS
prediction. Recent studies have shown that these approaches
achieve high accuracy of QoS predictions and yield encouraging
results on Web service recommendation.

Despite the potential benefits provided by Web service
recommender systems, a major impediment to the practical
deployment of such systems lies in their threats to user privacy.
To receive effective recommendations, users are required to
supply their observed QoS values. However, there is currently
no policy to protect users from privacy issues. Malicious
recommender systems, for example, may abuse the data, infer
private information from the data, or even resell the data to a
competing user for profit [8]. Even if the recommender system
is not malicious, an unintentional leakage of such data can
expose users to a broad set of privacy issues (e.g., QoS data may
reveal the underlying application configurations). This is why
application providers are not willing to disclose their private
usage data to the public or a third party. Such privacy threats
limit the QoS data collection from users and hence degrade
the accuracy of Web service recommendation. To encourage
broader user participation, it is desirable to consider
privacy-preserving approaches in which Web service recommendations
can be made without revealing private user data. Encryption is
a straightforward way to achieve privacy. However, encryption
techniques usually involve large computational overhead and
typically work for distributed collaborative filtering problems
(e.g., homomorphic encryption used in where multi-party
communication is necessary. This is inapplicable to our
problem because user-user communication is infeasible.

In this paper, we propose a simple yet effective
privacy-preserving framework for QoS-based Web service
recommendation. Specifically, users are enabled to obfuscate their
private data by data randomization techniques Col before they
expose the data to a recommender system. In this way, the
recommender system can only collect obfuscated QoS data
from users, which reduces the risk of exposing user privacy.
Our privacy-preserving framework is generic and can be
applied to both the neighbourhood-based collaborative filtering
approach, i.e., UIPCC ifTTl, and the model-based collaborative
filtering approach, i.e., PMF, which are the two most common
QoS prediction approaches in recent literature. We further
revamp these two existing QoS prediction approaches based
on our framework, and develop their corresponding
privacy-preserving variants: P-UIPCC and P-PMF. We evaluate these
approaches on the WS-DREAM dataset ifT^, a publicly-available
QoS dataset that has been widely employed for QoS prediction
evaluation in the literature. The experimental results show that
while preserving user privacy, our proposed approaches
(P-UIPCC and P-PMF) can still attain decent prediction accuracy
in comparison to the baseline approaches (UMEAN and
IMEAN) and the counterpart approaches (UIPCC and PMF).
We also show the tradeoff between the achieved prediction
accuracy and the preserved user privacy. For reproducibility,
we release the source code and detailed evaluation results on
our project page.

In summary, our paper makes the following contributions: 

• This is the first work to cope with the privacy concerns
for QoS-based Web service recommendation.

• We propose a simple yet effective privacy-preserving
framework, and further develop two representative
privacy-preserving QoS prediction approaches,
P-UIPCC and P-PMF, under this framework.

• We conduct experiments on a real-world large-scale
QoS dataset of Web services to evaluate the effectiveness
of privacy-preserving QoS prediction approaches.

Fig. 1. An Illustrative Example of QoS Prediction


viding a promising way to help users select high-quality 
services out of all the candidate services according to the user- 
perceived QoS values. Because it is prohibitively expensive 
or even infeasible for a user to acquire all the QoS values 
of the candidate services, the key of QoS-based Web service 
recommendation is to enable accurate QoS predictions. 

Collaborative filtering (CF) [5] has been widely used in
commercial recommender systems (e.g., movie recommendation
in Netflix, item recommendation in Amazon) for rating
prediction, where the observed user ratings are leveraged to
learn user preferences on the unrated movies or items and
further make predictions on the unknown ratings. In recent
literature, CF has been suggested as a promising approach
to QoS prediction (e.g., |^, ([H, [4]). As with the user-movie
rating matrix collected in a movie recommender system,
users invoking services can produce a user-service QoS matrix
with respect to each QoS attribute. We denote a QoS matrix
by $R$, whose entry $R_{ij}$ represents the observed QoS value
(e.g., response time) of user $u_i$ invoking service $s_j$. Fig. 1(b)
illustrates a QoS matrix with four users ($u_1, \ldots, u_4$) and five
services ($s_1, \ldots, s_5$), produced by the user-service invocation
graph in Fig. 1(a). In practice, the QoS matrix is very sparse
(i.e., most of the entries are unknown), since each user usually
invokes only a few services. As shown in Fig. 1(b), the grey
entries are observed QoS values (e.g., $R_{11} = 1.4$) and the
blank entries are unknown QoS values (e.g., $R_{12} = ?$). As
a result, the QoS prediction problem can be modelled as a
collaborative filtering problem. Fig. 1(c) shows the predicted
QoS matrix from the observed QoS matrix in Fig. 1(b), where
the unknown values are approximately reconstructed.

Specifically, two types of CF approaches have been studied
for QoS prediction of Web services in recent literature:


The remainder of this paper is organized as follows.
Section II introduces the background and related work. Section III
presents the framework of privacy-preserving Web service
recommendation. Then we describe the detailed QoS prediction
approaches in Section IV and report the evaluation results in
Section V. Finally, we conclude this paper in Section VI.


II. Background and Related Work 

In this section, we introduce the background of QoS-based 
Web service recommendation and review two representative 
collaborative filtering approaches used for QoS prediction. We 
then discuss the privacy issues and the key techniques for 
privacy preservation in related work. 


A. QoS-based Web Service Recommendation 

QoS-based Web service recommendation has recently attracted
much attention from the service community, for pro-

' http://wsdream.github.io/PPCE 


1) Neighbourhood-based Collaborative Filtering: This
type of CF approach uses the observed QoS data to compute
the similarity values between users or services, and further
leverages them for QoS prediction. Typical examples include
user-based approaches (e.g., UPCC lEl) that leverage the
QoS information of similar users for prediction, item-based
approaches (e.g., IPCC HI) that employ the QoS information
of similar items (i.e., services) for prediction, and their
hybrids (e.g., UIPCC fTTI ) that combine user-based and
item-based approaches for accuracy improvement.
These approaches are easy to implement, but they fail to deal
with the data sparsity problem, which limits their performance
in practice.

2) Model-based Collaborative Filtering: Model-based CF
approaches provide a predefined model to fit the observed QoS
data, and then the trained model can be used to predict the
unknown QoS values. Matrix factorization (e.g., PMF El)
is one of the most popular model-based CF approaches,
which was first introduced to address the QoS prediction
problem in [12]. The matrix factorization model handles the sparsity
problem well and usually achieves better performance than
neighbourhood-based approaches.

In this paper, we mainly look into two QoS prediction
approaches, UIPCC [HI and PMF. They are representatives
of the two types of CF approaches respectively and serve as
a basis to develop many more sophisticated approaches. For
example, some studies such as CloudPred I^, NIMF 1(6-1,
and LN-LFM ifTTI integrate neighbourhood-based and
model-based CF approaches, while some others suggest leveraging
additional context information such as location information |4^|
and time information m, m for improving prediction
accuracy. Our work focuses on providing a privacy-preserving
QoS prediction framework. Therefore, the studies on how to
build more sophisticated models for accuracy improvement are
orthogonal to our work and fall outside the scope of this paper.

B. Privacy Issues 

Privacy is an important issue that has raised particular 
concerns among many research areas. In the following, we 
review the privacy studies related to our work. 

1) Privacy in Service Computing: In service computing,
applications are typically built by composing Web services
offered by different service providers. User information often
needs to be shared across the providers to fulfil an overall
application task. This can raise privacy issues between users
and service providers when the selected Web services for
composition have privacy policies that are not compliant with
users' privacy requirements. In this regard, privacy-aware Web
service selection and composition (e.g., IM, lUD, imi)
have been studied. For example, Costante et al. ll^ propose
an approach to rank the candidate Web services with respect to
the privacy level they offer. Tbahriti et al. ll^ further provide a
mechanism to verify and negotiate privacy constraints between
users and service providers to enable privacy-compatible
service composition. Different from these studies, our work aims
to address privacy issues for Web service recommendation.

2) Privacy in Recommender Systems: In recommender
systems flM, users want to gain useful recommendations
without compromising their privacy. To achieve this, a variety of
privacy-preserving collaborative filtering approaches ll^ have
been proposed by using techniques such as randomization m,
cryptography [(^, anonymization ll26l, and so on. Privacy is
also of vital importance to the realization of QoS-based Web
service recommendation, where users might not be willing to
disclose their private usage data. However, there is currently a
lack of studies on how to cope with the privacy issues for
QoS-based Web service recommendation. Existing
privacy-preserving CF approaches are not directly applicable because
of the unique challenges posed by Web service
recommendation. For example, most of these approaches (e.g., ilZ/l , Il9i
Il28l ) require multi-party or peer-to-peer collaboration between
users, which is inapplicable to service users. To bridge this gap,
our paper makes the first attempt to build a privacy-preserving
QoS prediction framework for Web service recommendation.

III. Framework of Privacy-Preserving Web Service Recommendation

Fig. 2. Framework of Privacy-Preserving Web Service Recommendation


be separated into two parts executed at the user side and the server
side respectively. At the user side, the observed QoS data of each
user undergo a data obfuscation process in order to protect
user privacy as well as preserve the information required for
performing collaborative QoS prediction. The obfuscated user
data are then submitted to the server for QoS prediction.
After receiving the prediction results from the server, a
post-processing step is performed to recover the obfuscated results
to the true QoS prediction values. Finally, according to the
recovered QoS values, candidate Web services can be ranked
and recommended for the user. On the other hand, at the server
side, obfuscated QoS data are collected from different users in
a collaborative way, through which an obfuscated QoS matrix
can be acquired and stored in a QoS database. QoS prediction
is then performed on the obfuscated QoS matrix by using our
proposed privacy-preserving techniques such as P-UIPCC and
P-PMF. At the same time, a list of the available Web services is
maintained in a Web service database, which allows for service
ranking and recommendation for the users.

User privacy is preserved by our framework because: 1) 
For each user, user data are obfuscated before being submitted 
to the server, and the obfuscation settings are only known to 
the user itself; 2) For the server, collaborative QoS prediction 
is performed based solely on the obfuscated user data, whereby 
user-observed real QoS values cannot be inferred. In this 
way, our framework enables users with greater control on 
their private usage data and less dependence on the server 
for privacy preservation. The privacy-preserving framework is 
generic such that both of the representative QoS prediction 
approaches (i.e., UIPCC and PMF) can work well without the 
need of significant modifications. 

IV. QoS Prediction Approach 

The above framework enables data obfuscation for preserving
privacy, but also poses a challenge in accurate QoS
prediction. In this section, we describe the data obfuscation
process in detail, and then extend two representative QoS
prediction approaches (UIPCC and PMF) into their
privacy-preserving variants (P-UIPCC and P-PMF) accordingly.

A. Data Obfuscation 

The need for privacy preservation has led to the development
of a number of data obfuscation techniques, such as
data randomization Eol, data encryption H), and data
anonymization iH^. Due to the sparse nature of our data, in this paper
we make use of data randomization Ea, a simple yet effective
way to obfuscate the data.


Fig. 2 presents our privacy-preserving Web service
recommendation framework. The workflow of this framework can


The basic idea of data randomization is to add a random
value (i.e., noise) to the true value so that the resulting value
becomes disguised. In this way, when the obfuscated QoS
data undergo further processing, user information regarding
real QoS values can be preserved. Fortunately, although each
individual QoS value becomes disguised, we find that some
approximate computations (e.g., scalar product) on the
aggregated data of users can still be done with decent accuracy.

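To make it clear, we now describe the scalar product property
of data randomization (H in detail. Let $a = (a_1, \ldots, a_n)$
and $b = (b_1, \ldots, b_n)$ be true vectors with a mean of zero.
We obfuscate these vectors as $a' = a + \epsilon$ and $b' = b + \delta$,
where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ and $\delta = (\delta_1, \ldots, \delta_n)$ are random
noises generated from a uniform distribution in $[-\alpha, \alpha]$. Next,
we show that the scalar product between $a$ and $b$ can be
approximated by using the obfuscated vectors $a'$ and $b'$, i.e.,
$a' \cdot b' \approx a \cdot b$. To this end, we have

$$a' \cdot b' = \sum_{i=1}^{n} (a_i + \epsilon_i)(b_i + \delta_i) = \sum_{i=1}^{n} \left( a_i b_i + a_i \delta_i + b_i \epsilon_i + \epsilon_i \delta_i \right).$$

Because $a$ and $\delta$ are independent vectors and each has a
zero mean, we have $\sum_{i=1}^{n} a_i \delta_i \approx 0$. Likewise, we have
$\sum_{i=1}^{n} b_i \epsilon_i \approx 0$ and $\sum_{i=1}^{n} \epsilon_i \delta_i \approx 0$. Hence, we derive the
following approximation:

$$a' \cdot b' \approx \sum_{i=1}^{n} a_i b_i = a \cdot b. \qquad (1)$$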

With this observation, we find that data randomization can
potentially preserve user privacy as well as the usability of
the data for collaborative analysis. Therefore, it is appealing to
study how to apply this data obfuscation technique to performing
collaborative QoS prediction in a privacy-preserving way.
To achieve this goal, we propose a two-step data obfuscation
procedure for QoS data processing. We emphasize that, as
shown in our framework in Fig. 2, each user performs data
obfuscation individually at the user side before contributing the
QoS data to the server.
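
The scalar product property can be checked numerically. The following Python sketch is illustrative only: the vector length, the noise range alpha = 0.5, the Gaussian test vectors, and all function names are assumptions made here and are not taken from the paper's evaluation.

```python
import random


def dot(x, y):
    """Scalar product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def zero_mean_vector(n, rng):
    """Draw n Gaussian samples and subtract their mean (illustrative data)."""
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(values) / n
    return [v - mean for v in values]


def obfuscate(vec, alpha, rng):
    """Add independent uniform noise from [-alpha, alpha] to each entry."""
    return [v + rng.uniform(-alpha, alpha) for v in vec]


rng = random.Random(0)
n = 20000
a = zero_mean_vector(n, rng)
c = zero_mean_vector(n, rng)
# b is correlated with a so that the true scalar product is far from zero.
b = [0.8 * ai + 0.6 * ci for ai, ci in zip(a, c)]

true_product = dot(a, b)
obfuscated_product = dot(obfuscate(a, 0.5, rng), obfuscate(b, 0.5, rng))
relative_error = abs(obfuscated_product - true_product) / abs(true_product)
print(f"relative error of the obfuscated scalar product: {relative_error:.4f}")
```

The cross terms only average out over many entries, so the approximation is expected to be looser for short vectors or for larger noise ranges.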

1) Z-score normalization: To facilitate better randomization
of the data, we perform z-score normalization on the
observed QoS data as the first step. Z-score normalization is
a standard normalization method to adjust the data average
and data variance. The normalized data have a zero mean
and unit variance. More specifically, for user $u$, we denote
$R_u = (R_{u1}, \ldots, R_{um})$ as a vector of observed QoS values on
$m$ Web services. $R_{us} > 0$ indicates that user $u$ has invoked
service $s$; otherwise, $R_{us} = 0$. We compute the mean ($\bar{R}_u$)
and standard deviation ($\sigma_u$) of this QoS vector $R_u$:

$$\bar{R}_u = \sum_{s \in I_u} R_{us} / |I_u|, \qquad \sigma_u = \sqrt{\sum_{s \in I_u} (R_{us} - \bar{R}_u)^2 / |I_u|}, \qquad (2)$$

where $I_u = \{s \mid R_{us} > 0\}$ denotes the set of Web services
that have been invoked by user $u$. Then z-score normalization
is performed on the QoS values with the following equation:

$$r_{us} = (R_{us} - \bar{R}_u) / \sigma_u. \qquad (3)$$

The normalization step results in a zero-mean data vector that
is well suited for the following data randomization process.

2) Data Randomization: As the second step, we perform
randomized perturbation on the normalized QoS vector by:

$$r'_{us} = r_{us} + \epsilon_{us}, \qquad (4)$$

where $\epsilon_{us}$ is a random value generated from a specified
distribution, for example, a uniform distribution in $[-\alpha, \alpha]$.
In particular, when $\alpha = 0$, the overall data obfuscation process
reduces to a z-score normalization. We further study the effect
of different distributions (e.g., uniform distribution, Gaussian
distribution) of random noises on QoS prediction accuracy in
Section V-E.

After data obfuscation, users can submit their obfuscated
QoS data to the server. Given $n$ users and $m$ services, the
server can collect a QoS matrix denoted as $r' \in \mathbb{R}^{n \times m}$,
with each entry ($r'_{us}$) being obtained via Equ. (4). Since this
data obfuscation process is performed at the user side, private
information such as $\bar{R}_u$ and $\sigma_u$ is kept at the user side. As a
result, the server cannot infer the true QoS values of the users,
and user privacy is preserved.
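
The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `obfuscate_qos` is hypothetical, unobserved entries are returned as `None` rather than 0 to keep them distinguishable from small obfuscated values, and the noise is uniform in [-alpha, alpha] as in the example given for Equ. (4).

```python
import math
import random


def obfuscate_qos(qos, alpha, rng):
    """Obfuscate one user's QoS vector at the user side.

    qos   : list of QoS values, where 0 marks a service that was not invoked
    alpha : half-width of the uniform noise range
    Returns (obfuscated vector, mean, std). The mean and std stay with the
    user; only the obfuscated vector would be sent to the server.
    """
    observed = [v for v in qos if v > 0]
    if not observed:
        raise ValueError("the user has no observed QoS values")
    mean = sum(observed) / len(observed)
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    if std == 0:
        raise ValueError("z-score is undefined when all observed values are equal")

    obfuscated = []
    for v in qos:
        if v > 0:
            z = (v - mean) / std                              # z-score normalization
            obfuscated.append(z + rng.uniform(-alpha, alpha))  # randomization
        else:
            obfuscated.append(None)                           # unobserved (sketch convention)
    return obfuscated, mean, std


user_qos = [1.4, 0.0, 0.3, 2.0, 0.0, 0.9]
sent_to_server, private_mean, private_std = obfuscate_qos(user_qos, 0.5, random.Random(42))
```

With alpha = 0 the function reduces to plain z-score normalization, which matches the special case noted in the text.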

Next, we will show how we extend the two representative
approaches (UIPCC and PMF) to perform privacy-preserving
QoS prediction based on the obfuscated QoS matrix $r'$. Note
that UIPCC and PMF have been carefully reported in the
related work IfTTl . O, so we do not repeat the original
descriptions and present only the necessary extensions to them.


B. Privacy-Preserving UIPCC (P-UIPCC) 

UIPCC (a.k.a. WSRec), first proposed in [|2l, has been a
widely-studied QoS prediction approach. The key of UIPCC
is to compute the similarity between users and the similarity
between services, after which QoS values contributed by
similar users and similar services can be leveraged to compute
the prediction value. Existing work usually employs the Pearson
correlation coefficient (PCC) as the similarity measure. For
example, the PCC similarity between user $u$ and user $v$ is
defined as follows:

$$sim(u, v) = \frac{\sum_{s \in J} (R_{us} - \bar{R}_u)(R_{vs} - \bar{R}_v)}{\sqrt{\sum_{s \in J} (R_{us} - \bar{R}_u)^2} \sqrt{\sum_{s \in J} (R_{vs} - \bar{R}_v)^2}}, \qquad (5)$$

where $J = I_u \cap I_v$ is the set of Web services that are invoked
by both user $u$ and user $v$. $R_{us}$ is the true QoS value of
user $u$ invoking service $s$. $\bar{R}_u$ and $\bar{R}_v$ are the average QoS
values observed by user $u$ and user $v$, respectively. From this
definition, we have $sim(u, v) \in [-1, 1]$, where a larger PCC
value indicates higher user similarity.


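However, due to the obfuscation of QoS data, at the server
side we only have the obfuscated QoS value $r'_{us}$ rather than
its true value $R_{us}$. Therefore, we employ $r'_{us}$
to approximately compute the similarity value $sim(u, v)$ as
follows:

$$\begin{aligned}
sim(u, v) &= \frac{\sum_{s \in I_u \cap I_v} r'_{us} r'_{vs}}{\sqrt{|I_u| |I_v|}} && (6) \\
&\approx \frac{\sum_{s \in I_u \cap I_v} r_{us} r_{vs}}{\sqrt{|I_u| |I_v|}} && (7) \\
&= \frac{\sum_{s \in I_u \cap I_v} (R_{us} - \bar{R}_u)(R_{vs} - \bar{R}_v)}{\sigma_u \sigma_v \sqrt{|I_u| |I_v|}} && (8) \\
&= \frac{\sum_{s \in I_u \cap I_v} (R_{us} - \bar{R}_u)(R_{vs} - \bar{R}_v)}{\sqrt{\sum_{s \in I_u} (R_{us} - \bar{R}_u)^2} \sqrt{\sum_{s \in I_v} (R_{vs} - \bar{R}_v)^2}} && (9)
\end{aligned}$$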


By applying the scalar product property in Equ. (1) to Equ. (6),
substituting Equ. (3) into Equ. (7), and substituting Equ. (2)
into Equ. (8), we derive Equ. (9), which is exactly the
similarity measure used for collaborative filtering in the related
work d, Eo). Note that this similarity measure differs
slightly from Equ. (5) in the denominator part, but provides
a good approximation to it (as the experiments in
Section V show). Therefore, by using the obfuscated QoS data, we
employ Equ. (6) as the approximation of the similarity between
user $u$ and user $v$.
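
A server-side similarity computation on obfuscated vectors might look as follows. This is a hedged sketch rather than the authors' implementation: it assumes our reading of Equ. (6), namely the sum of products of obfuscated values over co-invoked services divided by sqrt(|I_u| |I_v|), and it assumes unobserved entries are stored as `None`. The function name is hypothetical.

```python
import math


def obfuscated_user_similarity(ru, rv):
    """Approximate user similarity from two obfuscated QoS vectors.

    ru, rv : equal-length lists of obfuscated values, None where unobserved.
    Only quantities visible to the server are used: the obfuscated values
    and the number of services each user has invoked.
    """
    n_u = sum(1 for x in ru if x is not None)
    n_v = sum(1 for y in rv if y is not None)
    if n_u == 0 or n_v == 0:
        return 0.0
    # Sum over services invoked by both users.
    shared = sum(x * y for x, y in zip(ru, rv) if x is not None and y is not None)
    return shared / math.sqrt(n_u * n_v)


# Two z-scored vectors without noise: identical users and opposite users.
same = obfuscated_user_similarity([1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0])
opposite = obfuscated_user_similarity([1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0])
```

Because the inputs carry random noise, the returned value is only approximately bounded by [-1, 1]; a caller that needs a strict bound could clip the result.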


After similarity computation between users, we can identify
a set of top-k similar neighbours ($T_u$) for each user $u$. Then
the unknown QoS value, for each entry where $r'_{us} = 0$, can be
estimated as the weighted average of the QoS values observed
by similar neighbours, i.e.,

$$\hat{r}^{u}_{us} = \sum_{v \in T_u} sim(u, v)\, r'_{vs} \Big/ \sum_{v \in T_u} sim(u, v). \qquad (10)$$

In a similar way, we can also leverage the information of
similar services to make QoS prediction:

$$\hat{r}^{s}_{us} = \sum_{g \in T_s} sim(s, g)\, r'_{ug} \Big/ \sum_{g \in T_s} sim(s, g), \qquad (11)$$

where $T_s$ is the set of top-k similar services of service $s$.
The similarity $sim(s, g)$ is further calculated by employing
the cosine similarity between service $s$ and service $g$:

$$sim(s, g) = \frac{\sum_{u \in I_s \cap I_g} r'_{us} r'_{ug}}{\sqrt{\sum_{u \in I_s \cap I_g} {r'_{us}}^2} \sqrt{\sum_{u \in I_s \cap I_g} {r'_{ug}}^2}}, \qquad (12)$$

where $I_s \cap I_g$ represents the set of users that invoke both service
$s$ and service $g$. Note that the cosine similarity here is equal to
the original PCC similarity in UIPCC, because the QoS vectors
have already been normalized during data obfuscation.


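Finally, as with UIPCC, a convex combination between
user-based QoS prediction and service-based QoS prediction
is employed to enhance the prediction accuracy:

$$\hat{r}_{us} = \lambda \hat{r}^{u}_{us} + (1 - \lambda) \hat{r}^{s}_{us}, \qquad (13)$$

where $\lambda$ controls the combination weight between $\hat{r}^{u}_{us}$ and
$\hat{r}^{s}_{us}$. In particular, when $\lambda = 0$, $\hat{r}_{us} = \hat{r}^{s}_{us}$; when $\lambda = 1$,
$\hat{r}_{us} = \hat{r}^{u}_{us}$.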
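
The neighbour-weighted prediction and the convex combination can be sketched as below. The function names and the input layout (a list of similarity/value pairs for the top-k neighbours) are assumptions for illustration; neighbour selection itself is left out.

```python
def weighted_neighbour_prediction(neighbours):
    """Similarity-weighted average over the top-k neighbours.

    neighbours : list of (similarity, obfuscated QoS value) pairs, either
                 similar users (user-based) or similar services (service-based).
    """
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return 0.0  # sketch convention: fall back to the normalized mean
    return sum(sim * value for sim, value in neighbours) / total


def hybrid_prediction(user_based, service_based, lam):
    """Convex combination of the two predictions, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * user_based + (1.0 - lam) * service_based


user_part = weighted_neighbour_prediction([(0.5, 1.0), (0.5, 3.0)])
service_part = weighted_neighbour_prediction([(0.9, 4.0)])
combined = hybrid_prediction(user_part, service_part, 0.25)
```

Setting `lam` to 1 keeps only the user-based estimate and setting it to 0 keeps only the service-based one, mirroring the two special cases of the combination weight.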


Formally, we denote the latent user factors as $U \in \mathbb{R}^{d \times n}$,
whose $u$-th column represents the latent factor of user $u$, and
the latent service factors as $S \in \mathbb{R}^{d \times m}$, whose $s$-th column
represents the latent factor of service $s$. Accordingly, we use
$U_u^T S_s$ to approximate the observed QoS value $R_{us}$ between
user $u$ and service $s$, i.e., $R_{us} \approx U_u^T S_s$, or more precisely,

$$R_{us} = U_u^T S_s + \delta_{us}, \qquad (15)$$

where $U_u^T$ is the transpose of $U_u$ and $\delta_{us}$ denotes the
approximation error. The goal is to minimize all of the approximation
errors. By taking $\delta_{us}$ as Gaussian noise na, the loss function
can be formulated as follows:

$$\mathcal{L} = \frac{1}{2} \sum_{u=1}^{n} \sum_{s=1}^{m} I_{us} (R_{us} - U_u^T S_s)^2 + \frac{\gamma}{2} \Big( \sum_{u=1}^{n} \|U_u\|^2 + \sum_{s=1}^{m} \|S_s\|^2 \Big). \qquad (16)$$

The first part measures the sum of squared approximation
errors between $R_{us}$ and $U_u^T S_s$, where $I_{us}$ acts as an indicator
that equals 1 if $R_{us}$ is observed, and 0 otherwise. The second
part consists of regularization terms used to avoid the overfitting
problem, where $\|\cdot\|$ denotes the Euclidean norm, and $\gamma$ is a
parameter to control the extent of regularization.

According to the basic PMF model as specified in Equ. (15),
the specific QoS of user $u$ invoking service $s$ can
be effectively captured by the interaction between user $U_u$ and
service $S_s$. However, some other effects known as biases
for determining the QoS values are independent of user-service
interactions. For example, the users with high network
bandwidth tend to experience fast network connections and
the services equipped with abundant system resources likely
provide short request-processing time. To capture these factors
associated with either users or services, a biased matrix
factorization model is suggested in ifTTl :

$$R_{us} = \mu + b_u + b_s + U_u^T S_s + \delta_{us}, \qquad (17)$$

where $\mu$ is a global bias, and $b_u$ and $b_s$ measure the user bias
and service bias respectively.


However, this prediction result $\hat{r}_{us}$ is a normalized value
that cannot reveal the prediction on the true QoS. When the
user receives the prediction results from the server, a
post-processing step, which reverses the
z-score normalization, can be taken to get the final prediction
value $\hat{R}_{us}$:

$$\hat{R}_{us} = \bar{R}_u + \sigma_u \cdot \hat{r}_{us}. \qquad (14)$$

Note that the post-processing step can only be performed at the
user side because $\bar{R}_u$ and $\sigma_u$ are only known to the user.
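
The user-side post-processing and the subsequent ranking can be sketched as follows. The function names, the dictionary layout, and the `smaller_is_better` flag are illustrative assumptions; the flag reflects that lower values are preferable for response time while higher values are preferable for throughput.

```python
def recover_predictions(normalized_predictions, mean, std):
    """Map normalized predictions back to the QoS scale using the user's own
    mean and standard deviation, which never leave the user side."""
    return {service: mean + std * value
            for service, value in normalized_predictions.items()}


def rank_services(recovered, smaller_is_better=True):
    """Return service identifiers ordered from best to worst predicted QoS."""
    return sorted(recovered, key=recovered.get, reverse=not smaller_is_better)


from_server = {"s1": 0.0, "s2": 1.0, "s3": -1.0}   # normalized predictions
recovered = recover_predictions(from_server, mean=2.0, std=0.5)
best_first = rank_services(recovered)              # response time: lower is better
```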

C. Privacy-Preserving PMF (P-PMF) 

PMF, or probabilistic matrix factorization ca, as a popular
model-based collaborative filtering approach, has been
suggested for QoS prediction by prior work |IT|, 1^. PMF works
on an essential assumption of the low-rank structure of the
QoS matrix. A matrix has a low rank when the entries of
the matrix are largely correlated. In our case, as reported by
the related work ca. la. similar users usually have similar
QoS values on the same Web service. The goal of PMF is
to map $n$ users and $m$ services into a joint latent factor
space with dimensionality $d$ such that each observed entry of
the QoS matrix can be captured as the inner product of the
corresponding latent factors.


While preserving user privacy, the application of data
obfuscation poses new challenges in modelling the obfuscated
QoS data. To accommodate the effect of data obfuscation, we
set $\mu = 0$ and $b_u = \bar{R}_u$. Accordingly, we derive the following
model:

l^us = bs Uu Sg Sus + ^US- ( 18 ) 

For ease of presentation, we further denote it as:

$$r'_{us} = b_s + U_u^T S_s + \delta_{us} + \epsilon_{us}. \qquad (19)$$

This model naturally accommodates the effect of z-score
normalization at the user side. By taking both $\delta_{us}$ and $\epsilon_{us}$ as Gaussian
noise na, the loss function can be expressed as:

$$\mathcal{L} = \frac{1}{2} \sum_{u=1}^{n} \sum_{s=1}^{m} I_{us} (r'_{us} - b_s - U_u^T S_s)^2 + \frac{\gamma}{2} \Big( \sum_{s=1}^{m} b_s^2 + \sum_{u=1}^{n} \|U_u\|^2 + \sum_{s=1}^{m} \|S_s\|^2 \Big). \qquad (20)$$

The minimization of this loss function can typically be solved
by the gradient descent algorithm used in [6] or the stochastic
gradient descent algorithm used in IfTTl . Due to space limits,
we omit the algorithmic description here and refer interested
readers to our supplementary report (see our project page).
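
Since the algorithmic description is omitted, the following Python sketch shows one plausible way to fit a model of the form "service bias plus inner product of latent factors" with stochastic gradient descent. It is not the authors' algorithm: the function names, the hyperparameter defaults, the small random initialization, and the per-sample application of the regularization term (a common SGD simplification) are all assumptions.

```python
import random


def train_ppmf(entries, n_users, n_services, d=2, lr=0.01, reg=0.05,
               epochs=200, seed=0):
    """Fit b_s + U_u . S_s to observed obfuscated entries with SGD.

    entries : iterable of (user index, service index, obfuscated value)
    Returns (service biases b, user factors U, service factors S).
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_users)]
    S = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_services)]
    b = [0.0] * n_services
    samples = list(entries)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, s, r in samples:
            err = r - (b[s] + sum(x * y for x, y in zip(U[u], S[s])))
            # Gradient steps on the squared error with L2 regularization.
            b[s] += lr * (err - reg * b[s])
            for k in range(d):
                uk, sk = U[u][k], S[s][k]
                U[u][k] += lr * (err * sk - reg * uk)
                S[s][k] += lr * (err * uk - reg * sk)
    return b, U, S


def predict(b, U, S, u, s):
    """Normalized prediction for user u and service s."""
    return b[s] + sum(x * y for x, y in zip(U[u], S[s]))


observed = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, 0.8), (1, 1, -1.2), (2, 0, 1.1)]
model = train_ppmf(observed, n_users=3, n_services=2)
missing_entry = predict(*model, 2, 1)   # user 2 never invoked service 1
```

The value returned by `predict` is still on the normalized scale, so the user would need to apply the post-processing of Equ. (14) before ranking services.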





TABLE I. Statistics of QoS Data

QoS         #Users   #Services   Range       Average   Std.
RT (sec)    339      5,825       0 - 20      0.909     1.973
TP (kbps)   339      5,825       0 - 1000    47.562    110.797


After obtaining the solutions with respect to $b_s$, $U_u$, and $S_s$,
we can make the following QoS prediction:

$$\hat{r}_{us} = b_s + U_u^T S_s. \qquad (21)$$

Finally, as with P-UIPCC, a post-processing step in
Equ. (14) is required to recover the prediction result $\hat{r}_{us}$ to
the true prediction value $\hat{R}_{us}$. For both P-UIPCC and P-PMF,
after obtaining the predicted QoS values of all the available
Web services, we can recommend to users those services with
top-ranked QoS values.


V. Evaluation 

This section describes the experiments and the corresponding
results of evaluating our privacy-preserving QoS prediction
approaches. In particular, we intend to answer the following
research questions.

RQ1: What is the effect of data obfuscation?

RQ2: What is the accuracy of P-UIPCC and P-PMF?

RQ3: What is the tradeoff between accuracy and privacy?

RQ4: What is the effect of distribution of random noises on
prediction accuracy?


A. Experimental Setup 

In our experiments, we focus mainly on two representative
QoS attributes: response time (RT) and throughput (TP).
Response time measures the time duration between a user sending
out a request and receiving a response, while throughput stands
for the data transmission rate of a user invoking a service.

The experiments are conducted based on a publicly-available
QoS dataset of real-world Web services ifTH . The
dataset was collected in August 2009, providing a total of
1,974,675 response time and throughput records of service
invocations between 339 users and 5,825 Web services. The
339 users are simulated by PlanetLab nodes distributed in
30 countries, while the 5,825 real-world Web services are
crawled from the Internet and are deployed in 73 countries.
Table I provides a summary of the statistics of the data.

In our experiments, we represent each type of QoS data 
by a 339-by-5825 QoS matrix with each entry denoting the 
observed response time/throughput of a specific invocation. 
In practice, the QoS matrix is very sparse because each user 
usually invokes only a handful of services. To simulate such 
data sparsity in our experiments, we randomly remove entries 
from the full data matrix and only keep a small density 
of historical QoS values. Data density = 10%, for example, 
indicates that each user invokes 10% of the services, or each 
service is invoked by 10% of the users. We leverage the 
preserved data entries for QoS prediction, and then use the 
removed QoS values as testing data for accuracy evaluation. 
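
The sparsification procedure can be sketched as below. This is an illustrative sketch under stated assumptions: `sample_density` is a hypothetical helper, the small random matrix stands in for the real 339-by-5825 data, and entries are kept uniformly at random so that the stated density holds on average per user and per service.

```python
import numpy as np

def sample_density(shape, density, rng):
    """Return an indicator matrix I with I[u, s] = 1 for kept (training)
    entries and 0 for removed (testing) entries."""
    n_total = shape[0] * shape[1]
    n_keep = int(round(density * n_total))
    flat = np.zeros(n_total, dtype=int)
    flat[rng.choice(n_total, size=n_keep, replace=False)] = 1
    return flat.reshape(shape)

rng = np.random.default_rng(42)
# Stand-in for the full QoS matrix (synthetic values, not the real dataset).
R_full = rng.uniform(0.0, 20.0, size=(30, 50))

I = sample_density(R_full.shape, density=0.10, rng=rng)
R_train = np.where(I == 1, R_full, np.nan)  # preserved entries only
R_test = R_full[I == 0]                     # removed entries, used for testing
```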

Fig. 3. Obfuscated QoS ($r'_{us}$) vs. True QoS ($R_{us}$): (a) $\alpha = 0$, (b) $\alpha = 0.5$, (c) $\alpha = 1$.


To quantify the accuracy of QoS prediction, we employ a
standard error metric, MAE (Mean Absolute Error), which has
been widely used in the existing work (e.g., |1T|, |0):

$$\mathrm{MAE} = \sum_{I_{us}=0} \left| \hat{R}_{us} - R_{us} \right| / N, \qquad (22)$$

where $R_{us}$ and $\hat{R}_{us}$ denote the observed QoS value and the
corresponding predicted QoS value of the invocation between
user $u$ and service $s$. $N$ is the total number of testing samples
to be predicted, i.e., entries with $I_{us} = 0$. A smaller MAE
value indicates better prediction accuracy.
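
As a small worked example of Equ. (22), the following sketch computes MAE over the testing entries only. The function name `mae` and the toy matrices are assumptions for illustration.

```python
import numpy as np

def mae(R, R_hat, I):
    """Mean absolute error over testing entries, i.e., those with I[u, s] == 0."""
    test = (I == 0)
    return np.abs(R_hat[test] - R[test]).mean()

R = np.array([[1.0, 2.0], [3.0, 4.0]])      # observed QoS values
R_hat = np.array([[1.0, 2.5], [2.0, 4.0]])  # predicted QoS values
I = np.array([[1, 0], [0, 1]])              # 1 = training entry, 0 = testing entry

score = mae(R, R_hat, I)  # (|2.5 - 2.0| + |2.0 - 3.0|) / 2 = 0.75
```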


B. Effect of Data Obfuscation (RQ1)

The aim of data obfuscation is to perturb the QoS data
such that user privacy regarding the true QoS values can
be preserved when performing collaborative analysis on the
server. To understand the effect of data obfuscation on
QoS data (RQ1), we compare the obfuscated QoS ($r'_{us}$) against
the corresponding true QoS data ($R_{us}$). As an example, we
randomly select a user from our dataset and provide three scatter
plots by using the response time data of this user. The plots
present the relationships between $r'_{us}$ and $R_{us}$ under different
$\alpha$ settings. $\alpha$ is a parameter that determines the range of noises
$\epsilon_{us}$ used to obfuscate the data. In particular, when $\alpha = 0$, the
data obfuscation reduces to a z-score normalization process.
Thus, Fig. 3(a) shows linear dependence between $r'_{us}$ and $R_{us}$.
Z-score normalization provides basic protection for
user data because it removes the mean and variance of the QoS
data: the data after z-score normalization have a
zero mean and unit variance. As $\alpha$ increases, the obfuscated
data become more and more disordered. As shown in Fig. 3(b)
and (c), the linear correlation between $r'_{us}$ and $R_{us}$ is further
eliminated. Consequently, a larger $\alpha$ indicates better protection
for user data. Note that we have similar observations on the
throughput data and thus omit the details here.
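
The effect described above can be reproduced with a short sketch. It assumes that obfuscation is a per-user z-score normalization followed by additive uniform noise in $[-\alpha, \alpha]$; the exact obfuscation formula is defined earlier in the paper, so the function `obfuscate_user` and the synthetic response times here are illustrative assumptions only.

```python
import numpy as np

def obfuscate_user(r, alpha, rng):
    """Assumed obfuscation: z-score normalize one user's QoS values, then
    add uniform noise drawn from [-alpha, alpha]."""
    z = (r - r.mean()) / r.std()
    noise = alpha * rng.uniform(-1.0, 1.0, size=r.shape)
    return z + noise

rng = np.random.default_rng(0)
# Synthetic response times for one hypothetical user (not the real dataset).
r_true = rng.exponential(scale=0.9, size=500)

corrs = {}
for alpha in (0.0, 0.5, 1.0):
    r_obf = obfuscate_user(r_true, alpha, rng)
    # Pearson correlation between true and obfuscated values.
    corrs[alpha] = np.corrcoef(r_true, r_obf)[0, 1]
```

With $\alpha = 0$ the correlation is exactly linear, and it weakens as $\alpha$ grows, which mirrors the trend across the three panels of Fig. 3.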

C. Prediction Accuracy (RQ2) 

Data obfuscation is useful for perturbing the QoS data to preserve
user privacy, but it is of little value unless the prediction
results remain accurate. We evaluate the accuracy of our
privacy-preserving QoS prediction approaches (P-UIPCC and 
P-PMF) based on the obfuscated QoS data, and compare them 
against the following baselines and counterpart approaches 
(RQ2). We emphasize that these existing approaches require 
users’ true QoS data and do not consider privacy issues. 

• UMEAN lO : This is a baseline approach that employs 
the average QoS value observed by a user (i.e., the row 
mean of R) to predict the unknown QoS of this user 
invoking other unused Web services. 


PlanetLab (https://www.planet-lab.org) is an open platform for system and
networking research, currently consisting of 1341 nodes at 654 global sites.
























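TABLE II. Parameter Settings

Approach | RT                                          | TP
UIPCC    | $k$: 10, $\lambda$: 0.1, -                  | $k$: 10, $\lambda$: 0.9, -
P-UIPCC  | $k$: 10, $\lambda$: 0.9, $\alpha$: 0.5      | $k$: 10, $\lambda$: 0.9, $\alpha$: 0.5
PMF      | $d$: 10, $\gamma$: 40, -                    | $d$: 10, $\gamma$: 800, -
P-PMF    | $d$: 10, $\gamma$: 12, $\alpha$: 0.5        | $d$: 10, $\gamma$: 12, $\alpha$: 0.5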


TABLE III. Prediction Accuracy (w.r.t. MAE)

QoS | Approach | Data Density: 10% | 15%    | 20%    | 25%    | 30%
RT  | UMEAN    | 0.875             | 0.875  | 0.875  | 0.875  | 0.875
RT  | IMEAN    | 0.688             | 0.683  | 0.681  | 0.680  | 0.679
RT  | UIPCC    | 0.582             | 0.501  | 0.450  | 0.427  | 0.411
RT  | PMF      | 0.487             | 0.452  | 0.431  | 0.418  | 0.409
RT  | P-UIPCC  | 0.569             | 0.537  | 0.512  | 0.495  | 0.482
RT  | P-PMF    | 0.540             | 0.504  | 0.478  | 0.458  | 0.443
TP  | UMEAN    | 53.835            | 53.816 | 53.801 | 53.804 | 53.799
TP  | IMEAN    | 26.860            | 26.716 | 26.641 | 26.593 | 26.571
TP  | UIPCC    | 22.370            | 20.219 | 18.928 | 17.891 | 17.080
TP  | PMF      | 15.994            | 14.670 | 13.924 | 13.405 | 13.117
TP  | P-UIPCC  | 23.572            | 21.324 | 19.754 | 18.681 | 17.953
TP  | P-PMF    | 20.702            | 18.451 | 17.351 | 16.634 | 16.063

• IMEAN O: Likewise, this baseline approach employs 
the observed average QoS value of a Web service (i.e., 
the column mean of R) to predict the unknown QoS 
of other users invoking this Web service. 

• UIPCC [[2), IfTll : This is a hybrid approach that
combines a user-based CF approach (UPCC) and an
item-based CF approach (IPCC) to make full use
of the historical information from similar users and 
services for QoS prediction. UIPCC typically performs 
better than either UPCC or IPCC. 

• PMF (hi: This is a widely used implementation of
the matrix factorization model ca, which has been
introduced to QoS prediction in |i6i|.
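
The two mean-based baselines in the list above are simple enough to sketch directly. The function name `umean_imean` and the fallback to the global mean for users or services with no observed entries are assumptions made for this illustration; the baseline descriptions do not specify that case.

```python
import numpy as np

def umean_imean(R, I):
    """Return (UMEAN, IMEAN) prediction matrices.

    R: QoS matrix; I: indicator matrix with 1 for observed entries.
    UMEAN predicts each user's row mean; IMEAN predicts each service's
    column mean. Both means use observed entries only.
    """
    obs = I.astype(bool)
    masked = np.where(obs, R, 0.0)
    global_mean = R[obs].mean()  # assumed fallback for empty rows/columns

    n_u = obs.sum(axis=1)
    n_s = obs.sum(axis=0)
    u_mean = np.where(n_u > 0, masked.sum(axis=1) / np.maximum(n_u, 1), global_mean)
    s_mean = np.where(n_s > 0, masked.sum(axis=0) / np.maximum(n_s, 1), global_mean)

    umean_pred = np.broadcast_to(u_mean[:, np.newaxis], R.shape)
    imean_pred = np.broadcast_to(s_mean[np.newaxis, :], R.shape)
    return umean_pred, imean_pred

R = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
I = np.array([[1, 1, 0], [1, 0, 1]])
umean_pred, imean_pred = umean_imean(R, I)
# Row means over observed entries: [1.5, 5.0]; column means: [2.5, 2.0, 6.0]
```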


For fair comparisons, we use the original parameters for
the counterpart approaches, as specified in the related work,
because we experiment on the same dataset. For consistency
with these settings, most parameters of our approaches are
set to the same values (e.g., $k = 10$ for top-$k$ neighbours in
UIPCC and P-UIPCC). However, since both P-UIPCC and P-PMF
work on obfuscated (normalized) data, we set different $\lambda$
and $\gamma$ values. The detailed parameters are specified in Table II.
We use $\alpha = 0.5$ in this experiment and study the effect of $\alpha$ in
Section V-D. Additionally, we vary the data density from 10%
to 30% in steps of 5%. Each approach is run
20 times under each data density (with different random seeds),
and the average MAE results are reported.


Table III provides the results of prediction accuracy with
comparisons among different approaches. The results show
that, while both of our approaches preserve decent privacy
by data obfuscation ($\alpha = 0.5$), they still perform much better
than the baselines including UMEAN and IMEAN, and
achieve comparable accuracy with the counterpart approaches
including UIPCC and PMF. In particular, P-UIPCC sometimes
performs better than UIPCC (e.g., 0.569 vs 0.582), which can
be attributed to the use of z-score normalization. Moreover,
we observe that even working on obfuscated data, P-PMF
mostly performs better than UIPCC. These encouraging results
indicate the effectiveness of privacy-preserving approaches. In
addition, we can see that the accuracy of these QoS prediction
approaches improves with the increase in data density.



Fig. 4. Tradeoff between Accuracy and Privacy: (a) Response Time, (b) Throughput.

Fig. 5. Impact of Matrix Density: (a) P-UIPCC, (b) P-PMF.


D. Tradeoff between Accuracy and Privacy (RQ3) 

Whereas the goal of our work is to achieve both accuracy
and privacy, there is indeed a tradeoff between them. At
one extreme, users can provide true QoS data to obtain the
most accurate QoS prediction results yet they lose privacy.
At another extreme, users can submit totally false QoS data to
preserve privacy but bad prediction results will be returned. To
study such tradeoff between accuracy and privacy (RQ3), we
consider the effect of noise range $\alpha$ on prediction accuracy,
because a larger $\alpha$ indicates better protection of privacy.
Specifically, in this experiment, we set data density = 10%
and vary $\alpha$ from 0 to 1 at a step increase of 0.1. Accordingly,
we obtain the prediction accuracy under each $\alpha$ value.

Fig. 4 presents the experimental results corresponding to
response time and throughput, respectively. We can observe
that both P-UIPCC and P-PMF degrade in accuracy (i.e., MAE
increases) when $\alpha$ becomes larger, because the utility of data is
less preserved. However, when $\alpha$ is small, e.g., less than 0.6 in
Fig. 4(a), our privacy-preserving approaches are more accurate
than UIPCC. Even when $\alpha$ is as large as 1, which is the variance
of data after z-score normalization, the prediction accuracy is
much better than the baselines (UMEAN and IMEAN). As
a result, a balance needs to be made between the accuracy
and privacy that a user wants to achieve. Additionally, we
find that PMF and P-PMF consistently outperform UIPCC and
P-UIPCC. This suggests the superior effectiveness of model-
based approaches in capturing the latent structure of the QoS
data, which conforms to the results reported in |i6i|.

E. Effect of Distribution of Random Noises (RQ4) 

In addition to the impact of noise range, a data randomization
scheme is also subject to the choice of the distribution
of random noises that are used for data obfuscation. In all of
the above experiments, the random noises are generated from
a uniform distribution located in $[-\alpha, \alpha]$. In contrast, in this
experiment, we consider a Gaussian distribution $\mathcal{N}(0, \alpha)$ with
a mean of zero and a standard deviation of $\alpha$. Compared to a
uniform distribution, random noises generated from a Gaussian
distribution are unevenly distributed. To investigate the effect
of distribution of random noises (RQ4), we vary the $\alpha$ value
and compare the prediction accuracy of P-UIPCC and P-PMF
with different settings on the distribution of random noises.

Fig. presents the results of the accuracy comparison. 
We can observe that, for both P-UIPCC and P-PMF, the 
randomization scheme with uniform noises performs better 
than the scheme with Gaussian noises. In particular, the
performance differs significantly between the two randomization
schemes under a large $\alpha$ setting. The results imply that the
distribution of random noises is a crucial factor for determining 
the performance of our privacy-preserving approaches. 
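
One property worth keeping in mind when comparing the two schemes is that, for the same $\alpha$, they inject different amounts of noise: a uniform distribution on $[-\alpha, \alpha]$ has standard deviation $\alpha/\sqrt{3}$ and is bounded, whereas the Gaussian with standard deviation $\alpha$ is unbounded. The sketch below checks this numerically. It is offered as a possible contributing factor, not as the explanation given in the experiments, and the sample size and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.5, 200_000

uniform_noise = rng.uniform(-alpha, alpha, size=n)
gaussian_noise = rng.normal(loc=0.0, scale=alpha, size=n)

std_uniform = uniform_noise.std()    # theory: alpha / sqrt(3), about 0.577 * alpha
std_gaussian = gaussian_noise.std()  # theory: alpha

# Fraction of Gaussian draws that exceed the uniform bound |noise| > alpha
# (theory: about 0.317, i.e., the mass outside one standard deviation).
frac_outside = np.mean(np.abs(gaussian_noise) > alpha)
```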

VI. Conclusion 

Privacy is a practical issue to be addressed for QoS-based
Web service recommendation. This paper makes an
initial effort to deal with the privacy-preserving Web service
recommendation problem. We propose a generic privacy-preserving
framework with the use of data obfuscation techniques,
under which users can gain greater control over their
data and rely less on the recommender system for privacy
protection. We further develop two privacy-preserving QoS
prediction approaches based on this framework, namely P-UIPCC
and P-PMF, as representatives of neighbourhood-based
CF approaches and model-based CF approaches, respectively.
To evaluate the effectiveness of P-UIPCC and P-PMF, we
conduct experiments on a publicly-available QoS dataset of
real-world Web services. The experimental results show that
our privacy-preserving QoS prediction approaches can still
achieve decent prediction accuracy compared with the counterpart
approaches. We hope that the encouraging results achieved in
this initial work can inspire more research efforts on
privacy-preserving Web service recommendation.